OCR Xpress for Linux Functionality

OCR Xpress for Linux is an SDK for building document recognition applications. It recognizes printed characters in a digital image. The image can be captured as a BMP file using a scanner, camera, or fax machine. Any uncompressed BMP file can be loaded directly for processing; no pre-processing is required. Once the image is loaded, the application has three processing choices: generate a searchable PDF file, generate a text file, or construct a hierarchically structured data model of the image.

Figure 1 shows the architecture of OCR Xpress for Linux. The API supports three forms of interaction, one for each of these processing choices.

Figure 1: OCR Xpress for Linux Model

To generate a PDF file, the API uses the PDF Renderer. The PDF Renderer performs OCR on the image to generate text data, builds a PDF file from that text data, and finally overlays the image on top of the text to produce a searchable PDF file. All of this takes just three API calls: OCRX_load_file() to load the image into OCR Xpress, OCRX_recognize_to_file() to generate the PDF file, and OCRX_free_dib() to free the memory associated with the image.

To generate a text file, the API uses the Memory Renderer and the Internal Structured Data to extract the text data from the image. Again, only three API calls are needed: OCRX_load_file(), OCRX_recognize_to_file(), and OCRX_free_dib().

The third form of API interaction is for applications that need to examine the spatial relationships of the text data in the image. For example, an application that needs to calculate how far a particular word in one text line is from another word in a different text line would use this level of API interaction to retrieve the locations (bounding areas) of the words in question.

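As a rough sketch of that calculation, the C code below assumes the location of each word has already been retrieved as a bounding rectangle in pixel coordinates. The WordRect type, its field names, and the helper functions are hypothetical illustrations, not part of the SDK. The sketch measures the edge-to-edge gap between the two rectangles (zero if they overlap) and returns the squared distance, which is enough for comparing distances and avoids linking the math library.

```c
/* Hypothetical bounding rectangle of a recognized word, in pixels.
   The SDK's own area/location type and field names may differ. */
typedef struct {
    long left;
    long top;
    long right;   /* exclusive: left + width */
    long bottom;  /* exclusive: top + height */
} WordRect;

/* Gap between the intervals [a_lo, a_hi) and [b_lo, b_hi) along one axis.
   Returns 0 when the intervals overlap. */
static long axis_gap(long a_lo, long a_hi, long b_lo, long b_hi)
{
    if (b_lo >= a_hi) return b_lo - a_hi;   /* b lies entirely after a */
    if (a_lo >= b_hi) return a_lo - b_hi;   /* a lies entirely after b */
    return 0;                               /* projections overlap */
}

/* Squared edge-to-edge distance between two word rectangles.
   Call sqrt() from <math.h> on the result if the actual distance is needed. */
long word_distance_sq(const WordRect *a, const WordRect *b)
{
    long dx = axis_gap(a->left, a->right, b->left, b->right);
    long dy = axis_gap(a->top, a->bottom, b->top, b->bottom);
    return dx * dx + dy * dy;
}
```

For example, a word spanning x = 100 to 160 (y = 50 to 70) and a word on a lower line spanning x = 190 to 250 (y = 110 to 130) are 30 pixels apart horizontally and 40 pixels apart vertically, so word_distance_sq() returns 2500, a distance of 50 pixels.
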
OCR Xpress for Linux constructs and maintains a hierarchical model of the OCR results. Through this hierarchically structured data model, every character found in the image is related to every other character in the image. The Internal Structured Data block in Figure 1 represents this model. Note that the model is organized into pages, regions, and text blocks. Figure 2 shows how text blocks are further organized into text lines, words, and characters.

Figure 2: Organization of text lines, words, and characters in a text block
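To make the nesting concrete, the following C sketch mirrors the page, region, text block, text line, word, and character levels with plain structs, assuming each level simply holds an array of the level beneath it, and walks the tree to count the characters on a page. All type and field names here are hypothetical; they are not the SDK's types and only illustrate how each character is reachable through its enclosing word, line, block, region, and page.

```c
#include <stddef.h>

/* Hypothetical mirror of the OCR result hierarchy (illustration only). */
typedef struct { char value; } OcrChar;
typedef struct { const OcrChar *chars; size_t char_count; } OcrWord;
typedef struct { const OcrWord *words; size_t word_count; } OcrLine;
typedef struct { const OcrLine *lines; size_t line_count; } OcrTextBlock;
typedef struct { const OcrTextBlock *blocks; size_t block_count; } OcrRegion;
typedef struct { const OcrRegion *regions; size_t region_count; } OcrPage;

/* Walk page -> region -> text block -> text line -> word and
   sum the number of characters in every word on the page. */
size_t count_page_chars(const OcrPage *page)
{
    size_t total = 0;
    for (size_t r = 0; r < page->region_count; ++r) {
        const OcrRegion *region = &page->regions[r];
        for (size_t b = 0; b < region->block_count; ++b) {
            const OcrTextBlock *block = &region->blocks[b];
            for (size_t l = 0; l < block->line_count; ++l) {
                const OcrLine *line = &block->lines[l];
                for (size_t w = 0; w < line->word_count; ++w)
                    total += line->words[w].char_count;
            }
        }
    }
    return total;
}
```

The same nested loops can be adapted to visit every word or text line instead, which is the kind of traversal an application performs when it interrogates the structured data level by level.
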

An application has access to all of this structured data through the Results Manager. The “Results Manager and Get Functions” topic in the “How To” section describes in detail how to access and use this data. To construct the Internal Structured Data, the application needs only two API calls: OCRX_load_file() and OCRX_recognize_to_memory(). The application can then use the OCRX_get_xxxx() functions to interrogate the Internal Structured Data and retrieve data from it.

Once the Internal Structured Data is constructed, it persists until the application calls OCRX_free_document_result().